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We analyze the effectiveness of working in pairs on the Conceptual Survey of 
Electricity and Magnetism test in a calculus-based introductory physics course. The 
group performance shows large normalized gain and evidence for co-construction. We 
discuss the effect of pairing students with different individual achievements. 



1 Introduction 

Peer collaboration as a learning tool has 
been exploited in many diverse instruc- 
tional settings and with different types and 
levels of student populations. Here we in- 
vestigate the effectiveness of working in 
pairs on the performance on a standard- 
ized conceptual multiple-choice test, Con- 
ceptual Survey of Electricity and Mag- 
netism (CSEM), in a calculus-based intro- 
ductory physics course. In a double class 
period (1 hour 50 minutes), students were 
administered the test individually and in 
groups of two after instruction in the rele- 
vant concepts. In both instances (whether 
group or individual) , students were allowed 
50 minutes. Between the individual and 
group testing there was a short 5-10 minute 
break, and students were required to turn 
in their first response so that they could not 
refer to it when working in group. The test 
answers were never discussed with students 
so when they switched from individual to 
group (or vice versa), they did not know if 
their initial responses were correct. 

Although some studies show that het- 
erogenous groups are more effective for 
group learning, others show that working 
with friends has special advantages. In 
this study, students were allowed to choose 
their own partners. They were encouraged 
to discuss the response with each other, 
and each test counted for one quiz grade. 
Students had an additional incentive to dis- 
cuss the concepts, because an examination 
in two weeks covered the same material. 
Moreover, students had extensive experi- 
ence working with two peers in the recita- 
tion on context-rich problems and with one 
peer during the lecture on Mazur-style con- 



cept tests. We note that the peer collabo- 
ration was unguided in that there was no 
help or facilitation from the instructor. 

2 Discussion 

To obtain two random equivalent samples, 
all students in the class sitting on one side 
of the aisle took the individual test first, 
followed by the group test (IG treatment: 
74 students or 37 pairs) while those on the 
other side of the aisle took the group test 
first before taking it individually (GI treat- 
ment: 54 students or 27 pairs). One reason 
for giving the test in both orders was to as- 
sess the effect of thinking individually be- 
fore the peer discussion. In the Mazur-style 
peer instruction, students are first asked to 
think about the concepts based on the as- 
sumption that not allowing an opportunity 
to think individually may prevent students 
from evaluating their own stand on a con- 
cept. Another reason for designing both 
the IG and GI treatments was to evaluate 
the "test-retest" or "practice" effect. 

Although the trends in some individual 
test items are interesting, here we only dis- 
cuss the effect of group work on the overall 
test scores. In Table 1, we show the aver- 
age individual and group scores for the GI 
and IG treatments and for the IG treat- 
ment, also the average normalized gain, 
g = {{sf — Sj)/(100 — Si)), where Si and 
Sf are the earlier (individual) and later 
(group) test scores in percent. The fact 
that the group performance on the IG and 
GI treatments are virtually indistinguish- 
able suggests that giving students an op- 
portunity to think individually before the 
peer discussion did not improve their group 



performance. Also, in the IG treatment, 
the normahzed gain of 0.39 in the group 
work is clearly not a "test-retest" effect be- 
cause considering the treatment samples to 
be equivalent, wc can compare the indi- 
vidual performance on IG treatment (55%) 
with the group performance of GI treat- 
ment (71.7%) which shows a gain of 0.37 
(indistinguishable from 0.39). Therefore, 
most of the following discussion will focus 
on the IG treatment. 
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70.3% 
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55% 


72.5% 


0.39 



Table 1: The average individual (I) and 
group (G) scores for the GI and IG treat- 
ments and, g for the IG treatment. 

But before moving on, we note some in- 
teresting trends in the time students took 
during the two successive testings in the 
IG and GI treatments. In the GI treat- 
ment, during the group work only, and in 
the IG treatment, both during the individ- 
ual and group work, students took roughly 
the same amount of time. One the other 
hand, students working individually after 
the group work in the GI treatment on an 
average took roughly one third of the time 
spend on group work. It appears that in 
the IG treatment, despite having worked 
on the problems individually, students were 
willing to spend the time discussing the 
same test because they found peer collab- 
oration useful. However, in the GI treat- 
ment, after having discussed the test with 
peers, students were reasonably sure about 
their thoughts and did not consider it nec- 
essary to brood over the problems again. 

2.1 Evidence for co-construction 

Although there is no consensus in the re- 
search literature on the definition of "co- 
construction", we use the term here to 
denote cases where neither student alone 
chose the correct response but did as a 
group. Co-construction can occur for 
several reasons. For example, if the 
group members chose different incorrect re- 
sponses, they will have to explain their rea- 
soning to each other. This can unravel 
problems in their initial logic and comple- 
mentary information provided by peers can 
help them converge on the correct solution. 



Even in cases where both students have the 
same incorrect response, co-construction 
can occur if students are unsure about their 
initial response and are willing to discuss 
their apprehensions with peers. Important 
clues provided by peers during the discus- 
sion can trigger recall of relevant concepts 
and can help the group co-construct. One 
attractive feature of peer collaboration is 
that since both peers have recently gone 
through similar difficulties in assimilating 
and accommodating the new material, they 
can often relate to each other's difficulties 
more easily than the instructor. The in- 
structors' extensive experience can often 
make a concept so obvious and automatic 
that they may not comprehend why stu- 
dents are misinterpreting various aspect of 
a concept or finding them confusing. 

Table 2 displays the percentage of the 
overall cases where neither, one, or both 
group members had the correct response 
individually and how they changed dur- 
ing the group work (for IG treatment). It 
shows the possibility for co-construction in 
29% of the cases where neither group mem- 
bers were correct individually. 

Individual Response Group Response 

Incorrect I Correct 

neither correct | 26% | 71%) | 29% 
one correct 38^ 22% 78^ 
both correct 36% Wo 100% 



Table 2: Distribution of the average group 
response for various combinations of indi- 
vidual response of group members in the IG 
treatment. The second column displays the 
percentage of the overall cases where nei- 
ther, one, or both group members had the 
correct response individually. 

To ensure that the cases in which both 
students individually chose the incorrect 
response but the group chose the correct 
response is not due to "just guessing", we 
analyze the first row of Table 2 in detail. In 
Table 3, we subdivide this row based upon 
whether both partners had the same or dif- 
ferent incorrect responses and if the group 
response was one of the original incorrect 
response or a third incorrect response. 

Table 3 shows that in 22% of the cases 
when both group members had the same 
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same incorrect (42%) 


22% 


70% 


8% 


different incorrect (58%) 


34% 


52% 


14% 



Table 3: Distribution of the average group 
response for cases where both members had 
the same or different incorrect individual 
response. 1, and O' refer to "correct", 
"one of the original incorrect" and "a third 
incorrect" response respectively. 



incorrect response, and in 34% of the cases 
when both had different incorrect response, 
the group response was correct. In com- 
parison, a relatively small frequency of a 
third incorrect group response that was not 
originally selected by either members sug- 
gests that students were not "just guess- 
ing" (see Table 3). Although we did not 
conduct formal interviews with students af- 
ter they worked in groups, we briefly dis- 
cussed the aspects of the group work they 
found helpful with several students. Most 
students admitted that they got useful in- 
sights about various electricity and mag- 
netism concepts by discussing them with 
peers. Students frequently noted (often 
with examples) that they had difficulty in- 
terpreting the problems alone but interpre- 
tation became easier with a friend. They 
also said that talking to peers forced them 
to think harder about the concepts, find 
fault with their initial reasoning, and re- 
minded them of concepts they had diffi- 
culty recalling on their own. Qualitative 
observations show that students were more 
likely to draw field lines, write equations or 
scribble on their exams in the group work 
than in the individual work. 

2.2 Negative impact of grouping? 

Table 2 (second row) shows that out of all 
the cases in which one group member indi- 
vidually had the correct response and the 
other had an incorrect response, 78% of 
the group responses were correct. The fact 
that 22% of such cases resulted in an incor- 
rect group response is not very troublesome 
because it was an unguided peer discus- 
sion. It can happen if students who indi- 
vidually chose the correct response are not 
very confident and cannot defend or jus- 
tify their response. The fact that a major- 



ity of such cases resulted in correct group 
responses suggests that students who in- 
dividually chose the correct response were 
generally more confident and were able to 
justify their choice. It also suggests that 
students who picked an incorrect item in- 
dividually were less sure of their choice and 
were willing to listen to their peers. As 
will be discussed in the next section, group 
work always results in individual gain. 
2.3 Individual gain and retention 
The individual performance in the GI treat- 
ment was much superior (70.3%) compared 
to the IG treatment (55%). One can hy- 
pothesize that students could immediately 
recall the group responses for all the 32 test 
items in the GI treatment and their supe- 
rior performance does not reflect the effec- 
tiveness of group work with regard to the 
retention of the concepts discussed. Sim- 
ilarly, in the IG treatment, the superior 
group performance compared to the indi- 
vidual performance is due to a large num- 
ber of cases where the group member with 
the correct response was able to convince 
the one with the incorrect response. It 
does not guarantee that students will retain 
what they learned in the group work. To 
investigate the impact of group interaction 
on retention, two weeks after the IG and GI 
treatments, all students individually took 
the CSEM test again. In view of the fact 
that students earlier went through either 
the IG or GI treatments, we now relabel the 
treatments IGI and GIL We note that stu- 
dents did not know that they will be tak- 
ing the same test again. Although it would 
have been better to use different, equally 
reliable test for assessing similar concepts; 
unfortunately, none was available. 

In the IGI treatment, the average score 
for the second individual testing was 74%, 
a gain of 0.42 compared to the initial indi- 
vidual score of 55%. A detailed comparison 
of the group vs. the second individual test 
scores shows that 81% of the overall indi- 
vidual responses chosen by the members of 
a particular group were the same as the 
group responses. Out of the 19% individ- 
ual responses that were different from the 
group, roughly 11% went from incorrect to 
correct while 8% went from correct to in- 
correct. There are two competing effects: 
the fact that students forgot some group 
responses over time and the fact that they 
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Table 4: (a) [left] The average initial indi- 
vidual score, and (b) [right] the normalized 
gain for the nine types of pairs. The top 
row of the Table refers to the performance 
of A students for different kinds of pair- 
ings: (A A), (A B) and (A C). Similarly, 
the middle and bottom row refer to the per- 
formance of B and C students respectively. 



had time to study for the second individual 
test; a large fraction of group response was 
retained even after two weeks. The corre- 
sponding numbers in the Gil treatment are 
virtually the same. 
2.4 Effective pairing 
To learn about the type of pairings that will 
optimize the overall gain for this concep- 
tual test, we divided the 74 students in the 
IGI treatment into three equal types: high 
(A), middle (B), and low (C), based upon 
their initial individual score on CSEM. In 
Tables 4a and 4b, we show the average ini- 
tial individual score (left) and the normal- 
ized gain in the second individual testing 
(right) for all the nine possible pairs. In 
comparison, for all 74 students together, 
the average initial individual score was 55% 
and the average individual gain was 0.42. 

Table 4 shows that although all students 
in the IGI treatment gained in the sec- 
ond individual testing two weeks later com- 
pared to the first, the gain matrix is not 
symmetric. For example, the gain of A stu- 
dent due to the interaction with C (0.29) 
is not the same as that of C due to A 
(0.41) in the (A C) pairing. It is striking 
that the gain of A and B students is sig- 
nificantly lower when they paired with C 
students. 'A' students have similar gains 
whether they paired with A or B students 
while B students benefit more from pair- 
ing with A than with another B student. 
Interestingly, the gain of C students (low- 
est third) is virtually the same regardless 
of who they paired with (bottom row of 
Table 4). One hypothesis is that while 
pairing with a higher achievement student 
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Table 5: (a) [left] The average group score, 
and (b) [right] the second individual test 
score for the nine types of pairs. 



helped C students do well in the group 
test, they did not retain all of the concepts 
because they were subdued by the higher 
achieving student and did not participate 
actively in the discussions. Thus, the op- 
portunity to learn from the higher achiev- 
ing student may have been outweighed by 
the inability of C students to participate 
fully in the discussions and process the in- 
formation at the rate discussed by the other 
student. On the other hand, in the (C 
C) pairing, both students had comparable 
but at least some complementary knowl- 
edge and both actively participated in the 
discussions. The evidence for this hypothe- 
sis comes from the comparison of the aver- 
age group (left) and second individual test 
scores (right) for each of the pairings as 
shown in Tables 5a and 5b. 

A comparison of Tables 5a and 5b shows 
that the average individual score of C stu- 
dents in the (A C) pairing after two weeks 
(61%) is lower compared to their group 
score (72%). Also, a comparison of Tables 
4a and 5a shows that A students did not 
benefit from interactions with C students 
and the average initial individual score for 
A students and the group score for the (A 
C) pairing arc the same. Similar compar- 
isons for B students shows that although 
they benefitted from all types of interac- 
tions, their gain improved as they inter- 
acted with higher achieving students. It 
appears that at least for this conceptual 
test, the pairing that helps maximize the 
overall gain is one that only has (A B) and 
(C C) pairs. It will be useful to investigate 
the extent to which this result is univer- 
sal, i.e., whether two peers collaborating on 
conceptual tests show highest overall gains 
when the high and middle achievement stu- 
dents are paired, and the low achievement 
students are paired with each other. 



